A new approach for phoneme segmentation of speech signals

Authors

  • Ladan Golipour
  • Douglas D. O'Shaughnessy
Abstract

In this paper, we present a new method for segmenting speech at the phoneme level. The method operates on the short-time Fourier transform of the speech signal. The goal is to locate the main changes in spectral energy over time, which can be interpreted as phoneme boundaries. For further precision, we also apply a sub-band analysis and search for energy changes in individual bands. Moreover, we employ the modified group-delay function to obtain a clearer representation of the boundary locations and to smooth out undesired fluctuations of the signal. We also study the use of an auditory spectrogram instead of a regular spectrogram in the segmentation process. Since this method uses only the power spectrum of the signal for segmentation, it requires no parameter adaptation or training for different speakers in advance. In addition, no transcript information, such as the phonemes themselves or voiced/unvoiced decisions, is required. The method was tested on the phonetically diverse part of the TIMIT database, and the results show that 87% of the boundaries are successfully recognized.
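The abstract gives no algorithmic detail, so the following is only a rough sketch of the general idea it describes: compute a short-time power spectrum, sum it into sub-bands, and mark frames where the band energies change sharply. It is not the authors' implementation. It omits the modified group-delay function and the auditory spectrogram, and every name and parameter value (`frame_len`, `hop`, `n_bands`, `lag`, the relative threshold) is an assumption chosen for illustration. A naive DFT is used to keep the example free of dependencies; an FFT routine would normally replace it.

```python
import cmath
import math


def power_spectrogram(signal, frame_len=64, hop=32):
    """Short-time power spectrum (Hann window, naive DFT, bins 0..frame_len/2)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    twiddle = [[cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)] for k in range(n_bins)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        x = [signal[start + n] * window[n] for n in range(frame_len)]
        frames.append([abs(sum(a * w for a, w in zip(x, row))) ** 2
                       for row in twiddle])
    return frames


def band_log_energies(frames, n_bands=4):
    """Sum the power in equal-width sub-bands and take the log of each sum."""
    n_bins = len(frames[0])
    edges = [b * n_bins // n_bands for b in range(n_bands + 1)]
    return [[math.log(sum(f[edges[b]:edges[b + 1]]) + 1e-10)
             for b in range(n_bands)] for f in frames]


def change_curve(band_energies, lag=2):
    """Summed absolute band-energy difference between frames t+lag and t-lag."""
    n = len(band_energies)
    curve = [0.0] * n
    for t in range(lag, n - lag):
        curve[t] = sum(abs(a - b) for a, b in
                       zip(band_energies[t + lag], band_energies[t - lag]))
    return curve


def pick_boundaries(curve, threshold):
    """Frame indices that are local maxima of the curve above the threshold."""
    return [t for t in range(1, len(curve) - 1)
            if curve[t] > threshold
            and curve[t] >= curve[t - 1] and curve[t] > curve[t + 1]]


# Toy signal (assumption): a 500 Hz tone that switches to 3000 Hz at sample 1000.
sample_rate = 8000
signal = [math.sin(2 * math.pi * (500 if n < 1000 else 3000) * n / sample_rate)
          for n in range(2000)]

frames = power_spectrogram(signal)
curve = change_curve(band_log_energies(frames))
boundaries = pick_boundaries(curve, threshold=0.5 * max(curve))
# Convert frame indices to the sample index at the centre of each frame.
boundary_samples = [t * 32 + 32 for t in boundaries]
```

On the toy signal, the detected boundary positions fall close to the switch at sample 1000. Real speech would need smoothing of the change curve and a more careful threshold, which is the role the abstract assigns to the modified group-delay function.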

Similar resources

Unsupervised Phoneme Segmentation Using Transformed Cepstrum Features

One of the basic problems in speech engineering is phoneme segmentation, that is, to divide a speech stream into a string of phonemes. Automatic Speech Recognition (ASR) models often require reliable phoneme segmentation in the initial training phase, and Text-to-Speech (TTS) systems need a large speech database with correct phoneme segmentation information for improving the performance. Human ...


Statistical corpus-based speech segmentation

An automatic speech segmentation technique is presented that is based on the alignment of a target speech signal with a set of different reference speech signals generated by a specifically designed corpus-based speech synthesis system that additionally generates phoneme boundary markers. Each reference signal is then warped to the target speech signal. By synthesizing and warping many different re...


A New Text-Independent Method for Phoneme Segmentation

A new approach for text-independent speech segmentation is proposed. The novelty consists in a preprocessing based on critical-band perceptual analysis and an original algorithm for the individuation of phoneme boundaries. The results are promising since the method gives 74% of correct segmentation without presenting over-segmentation.


Speech/Non-Speech Segmentation Based on Phoneme Recognition Features

This work assesses different approaches for speech and non-speech segmentation of audio data and proposes a new, high-level representation of audio signals based on phoneme recognition features suitable for speech/non-speech discrimination tasks. Unlike previous model-based approaches, where speech and non-speech classes were usually modeled by several models, we develop a representation where ...


Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain

This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...


A Time-Frequency approach for EEG signal segmentation

The record of human brain neural activities, namely electroencephalogram (EEG), is generally known as a non-stationary and nonlinear signal. In many applications, it is useful to divide the EEGs into segments within which the signals can be considered stationary. Combination of empirical mode decomposition (EMD) and Hilbert transform, called Hilbert-Huang transform (HHT), is a new and powerful ...


Publication date: 2007